Transformation-based Probabilistic Clustering with Supervision
نویسندگان
چکیده
One of the common problems with clustering is that the generated clusters often do not match user expectations. This paper proposes a novel probabilistic framework that exploits supervised information in a discriminative and transferable manner to generate better clustering of unlabeled data. The supervision is provided by revealing the cluster assignments for some subset of the ground truth clusters and is used to learn a transformation of the data such that labeled instances form well-separated clusters with respect to the given clustering objective. This estimated transformation function enables us to fold the remaining unlabeled data into a space where new clusters hopefully match user expectations. While our framework is general, in this paper, we focus on its application to Gaussian and von MisesFisher mixture models. Extensive testing on 23 data sets across several application domains revealed substantial improvement in performance over competing methods.
منابع مشابه
Supplementary Material for Transformation-Based Probabilistic Clustering using Supervision
6 Datasets 10 6.1 Time-series datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 6.2 Text data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 6.3 Handwriting Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 6.4 Face Recognition . . . . . . . . . . . . . . . . ...
متن کاملProbabilistic Semi-Supervised Clustering with Constraints
Unsupervised clustering can be significantly improved using supervision in the form of pairwise constraints, i.e., pairs of instances labeled as belonging to same or different clusters. In recent years, a number of algorithms have been proposed for enhancing clustering quality by employing such supervision. Such methods use the constraints to either modify the objective function, or to learn th...
متن کاملApplication of Probabilistic Clustering Algorithms to Determine Mineralization Areas in Regional-Scale Exploration Studies
In this work, we aim to identify the mineralization areas for the next exploration phases. Thus, the probabilistic clustering algorithms due to the use of appropriate measures, the possibility of working with datasets with missing values, and the lack of trapping in local optimal are used to determine the multi-element geochemical anomalies. Four probabilistic clustering algorithms, namely PHC,...
متن کاملپخش بار احتمالاتی با استفاده از تبدیل بی بوی کروی
Today's with the increasing development of distributed energy resources, power system analysis has been entered a new level of attention. Since the majority of these types of energy resources are affected by environmental conditions, the uncertainty in the power system has been expanded; so probabilistic analysis has become more important. Among the various methods of probabilistic analysis, po...
متن کاملA Comparison of Inference Techniques for Semi-supervised Clustering with Hidden Markov Random Fields
Recently, a number of methods have been proposed for semi-supervised clustering that employ supervision in the form of pairwise constraints. We describe a probabilistic model for semisupervised clustering based on Hidden Markov Random Fields (HMRFs) that incorporates relational supervision. The model leads to an EMstyle clustering algorithm, the E-step of which requires collective assignment of...
متن کامل